Implementing state-of-the-art models for text classification can look daunting, seemingly requiring vast amounts of compute and mathematical rigor. However, the widely used transformers library, a PyTorch-based library for training BERT and BERT-like models, offers a pipelines API that allows anyone to build a working model with cutting-edge performance.
Launched in December 2019, the pipeline API lets developers and data scientists alike apply competitive models to various downstream tasks, including question answering, sentiment analysis, and named entity recognition. Each pipeline ships with a ready-to-use, already fine-tuned model that performs well on its task.
In this blog post, we explore three applications of the pipeline API: sentiment analysis, question answering, and named entity recognition.
To get the sentiment of any sentence or document, we use the ‘sentiment-analysis’ pipeline, which loads the recently released DistilBERT model fine-tuned on the SST-2 dataset (a benchmark dataset for sentiment analysis).
from transformers import pipeline

# Initialise the sentiment analysis pipeline (DistilBERT fine-tuned on SST-2)
nlp = pipeline('sentiment-analysis')

# Statement whose sentiment is to be predicted
statement = "The weather is not that good, and everything else looks pretty chill, though."

# The 'sentiment-analysis' pipeline takes a string as an input
output = nlp(statement)

# Printing the result
print(output)
The given code outputs a list containing a dictionary with the following keys:
‘label’: The predicted sentiment of the statement, represented by its label (for example, POSITIVE or NEGATIVE).
‘score’: The confidence with which the model has predicted the given label.
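The pipeline also accepts a list of strings and returns one result per input. The snippet below is a minimal sketch of that usage; the example statements and the printed formatting are our own additions, and exact scores will vary with the model version.

from transformers import pipeline

nlp = pipeline('sentiment-analysis')

# Passing a list of strings returns one {'label', 'score'} dictionary per input
statements = [
    "Topcoder challenges are a lot of fun.",
    "The build keeps failing and the logs are not helpful."
]
results = nlp(statements)

for text, result in zip(statements, results):
    # Each result carries the predicted label and the model's confidence
    print(f"{text} -> {result['label']} ({result['score']:.3f})")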
For the task of answering a question given some context, the following code enables us to use a Google BERT (2018) variant: BERT Large (whole-word masking) fine-tuned on SQuAD 1.0.
from transformers import pipeline

# Question-answering pipeline (BERT large fine-tuned on SQuAD 1.0)
nlp = pipeline('question-answering')

# Question to be asked to the pipeline
question = "Where is New Delhi?"

# Context, with respect to which the question is to be asked
context = "New Delhi is the capital of India."

# Passing the question and context into the QA pipeline
output = nlp(question=question, context=context)

# Printing the output
print(output)
The given code outputs a dictionary with the following keys:
‘answer’: The answer to the question as a string.
‘score’: The confidence with which the model has predicted the given answer.
‘start’: The character index in the context at which the answer starts.
‘end’: The character index in the context at which the answer ends.
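To make ‘start’ and ‘end’ concrete, the short sketch below (our own addition, reusing the question and context from the snippet above) slices the context with those offsets; assuming they are character indices as described, the slice should reproduce the returned answer.

from transformers import pipeline

nlp = pipeline('question-answering')

question = "Where is New Delhi?"
context = "New Delhi is the capital of India."

output = nlp(question=question, context=context)

# 'start' and 'end' are character offsets into the context,
# so slicing the context with them recovers the answer span
answer_span = context[output['start']:output['end']]
print(answer_span)        # expected to match the value below
print(output['answer'])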
For the task of recognizing named entities within a given document, we use the Facebook AI Research model XLM-R, fine-tuned on the CoNLL-2003 dataset, which achieves state-of-the-art performance with an F1 score of 88.7 (reference).
from transformers import pipeline

# Named entity recognition pipeline (XLM-R fine-tuned by @stefan-it on CoNLL-2003 English)
nlp = pipeline('ner', model='xlm-roberta-large-finetuned-conll03-english')

# Statement in which named entities are to be found
statement = "Topcoder makes me happy."

# Passing the statement to the NER pipeline
output = nlp(statement)

# Printing the output
print(output)
The given code outputs a list of named entities, each represented by a dictionary with the following keys:
‘word’: The named entity (or token) as a string.
‘score’: The confidence with which the model has predicted the given named entity.
‘entity’: The type of named entity, for example ‘I-PER’ for a person and ‘I-LOC’ for a location.
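Since the raw output is a list of per-token dictionaries, a small post-processing loop is often handy. The sketch below is our own illustration, using only the ‘word’, ‘entity’, and ‘score’ fields described above; note that a subword tokenizer such as XLM-R's may split a single word across several entries.

from transformers import pipeline

nlp = pipeline('ner', model='xlm-roberta-large-finetuned-conll03-english')

output = nlp("Topcoder makes me happy.")

# Print each detected (sub)token with its entity tag and confidence
for entity in output:
    print(f"{entity['word']}: {entity['entity']} ({entity['score']:.2f})")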
This article shows that models competitive with the state of the art can be used efficiently through the PyTorch-based transformers library, which not only helps in creating prototype models for various NLP tasks but also makes it easy to deploy them at industry scale.